Hashing, Load Balancing and Multiple Choice DRAFT∗

Author

  • Udi Wieder
Abstract

Many tasks in computer systems can be abstracted as distributing items into buckets so that the allocation of items across buckets is as balanced as possible and, given an item's identifier, the bucket it was assigned to can be determined quickly. A canonical example is a dictionary data structure, where the items are key-value pairs and the buckets are memory locations. Another example is a distributed key-value store, where the buckets represent locations on disk or even whole servers. A third example is a distributed execution engine, where the items are processes and the buckets are computational devices. A common technique in this domain is to use a hash function that maps an item to a relatively short fixed-length string, which is then used to associate the item with its bucket. Hashing is typically only the first step of a solution: additional algorithmic ideas are required to deal with collisions and with the imbalance of hash values. In this manuscript we survey some of these techniques. We focus on multiple choice schemes, in which several independent hash functions give each item several candidate buckets and the item is typically placed in the candidate that is least loaded at the time of placement. We analyze the resulting load distributions in detail and show how these ideas can be used to design basic data structures. Among data structures we focus on dictionaries, presenting linear probing, cuckoo hashing and many of their variants.

∗ Feedback: please send errors, typos and omissions to the author.
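The multiple choice scheme described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the manuscript: the names `candidate_buckets`, `MultipleChoiceTable` and `build_table` are hypothetical, the d "independent" hash functions are only simulated by salting BLAKE2b with the function index, and keys are assumed to be distinct strings.

```python
import hashlib


def candidate_buckets(key, num_buckets, d=2):
    """Return d candidate bucket indices for key.

    Assumption: the d hash functions are simulated by salting a single
    hash (BLAKE2b) with the function index i.
    """
    out = []
    for i in range(d):
        digest = hashlib.blake2b(f"{i}:{key}".encode(), digest_size=8).digest()
        out.append(int.from_bytes(digest, "big") % num_buckets)
    return out


class MultipleChoiceTable:
    """Toy table: each key goes to the least loaded of its d candidates."""

    def __init__(self, num_buckets, d=2):
        self.d = d
        self.buckets = [[] for _ in range(num_buckets)]

    def insert(self, key):
        # Assumes key was not inserted before (no duplicate check).
        candidates = candidate_buckets(key, len(self.buckets), self.d)
        # min() returns the first minimum, so ties go to the earlier hash.
        target = min(candidates, key=lambda b: len(self.buckets[b]))
        self.buckets[target].append(key)
        return target

    def find(self, key):
        """Locate key by probing only its d candidate buckets."""
        for b in candidate_buckets(key, len(self.buckets), self.d):
            if key in self.buckets[b]:
                return b
        return None

    def max_load(self):
        return max(len(bucket) for bucket in self.buckets)


def build_table(num_items, num_buckets, d):
    """Insert num_items synthetic keys and return the resulting table."""
    table = MultipleChoiceTable(num_buckets, d)
    for i in range(num_items):
        table.insert(f"item-{i}")
    return table
```

Comparing `build_table(n, m, 1).max_load()` with `build_table(n, m, 2).max_load()` for the same `n` and `m` is one way to observe the effect of the second choice; with n items and n buckets the maximum load is known to drop from roughly log n / log log n with one choice to roughly log log n / log d with d ≥ 2 choices, with high probability. Lookup stays cheap because `find` inspects at most d buckets.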


Similar resources

Performance of Hashing-Based Schemes for Internet Load Balancing

Load balancing is a key technique for improving Internet performance. Effective use of load balancing requires good traffic distribution schemes. We study the performance of several hashing schemes for distributing traffic over multiple links while preserving the order of packets within a flow. Although hashing-based load balancing schemes have been proposed in the past, this is the first compr...


Symmetric vs. Asymmetric Multiple-Choice Algorithms

Multiple-choice allocation algorithms have been studied intensively over the last decade. These algorithms have several applications in the areas of load balancing, routing, resource allocation and hashing. The underlying idea is simple and can be explained best in the balls-and-bins model: Instead of assigning balls (jobs, requests, or keys) simply at random to bins (machines, servers, or posi...


Load-balancing and resource-provisioning in large distributed systems. (Équilibrage de charge et répartition de ressources dans les grands systèmes distribués)

The main theme of this thesis is load-balancing in large sparse random graphs. In the computer science context, a load-balancing problem occurs when we have a set of tasks which need to be distributed across multiple resources, and to resolve the load-balancing problem one needs to specify which tasks are going to be handled by which resources. Depending on the context, tasks and resources may ...


Internet Traffic Load Balancing using Dynamic Hashing with Flow Volume

Sending IP packets over multiple parallel links is in extensive use in today’s Internet and its use is growing due to its scalability, reliability and cost-effectiveness. To maximize the efficiency of parallel links, load balancing is necessary among the links, but it may cause the problem of packet reordering. Since packet reordering impairs TCP performance, it is important to reduce the amoun...


Internet Traffic Distribution over Multilink Where High Bandwidth Scalable Switch Port Aggregates Multiple Physical Links

A logical link composed of multiple physical links is in extensive use in today’s Internet and its use is growing due to good scalability, reliability and cost-effectiveness. When IP packets are distributed over such physical links, load unbalancing and packet reordering may occur. Since packet reordering degrades TCP performance, a good traffic distribution method must reduce the amount of reo...




Publication date: 2017